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I Abstract. The estimation of a covariance matrix from an insufficient amount of data is one of tlie most 

common problems in fields as diverse as multivariate statistics, wireless communications, signal processing, 
biology, learning theory and finance. In 1131 . a new approach to handle singular covariance matrices was 
suggested. The main idea was to use dimensionality reduction in conjunction with an average over the unitary 
^ matrices. In this paper we continue with this idea and we further consider new innovative approaches that 

O show considerable improvements with respect to traditional methods such as diagonal loading. One of the 

1 ^^ ^ methods is called the Ewens estimator and uses a randomization of the sample covariance matrix over all 

the permutation matrices with respect to the Ewens measure. The techniques used to attack this problem 
'"^ are broad and run from random matrix theory to combinatorics. 

r1 1. Introduction 

cd 

a The estimation of a covariance matrix from an insufficient amount of data is one of the most common 
I I problems in fields as diverse as multivariate statistics, wireless communications, signal processing, biology, 

learning theory and finance. For instance, the covariation between asset returns plays a crucial role in 
^-H modern finance. The covariance matrix and its inverse are the key statistics in portfolio optimization and 

^ risk management. Many recent financial innovations involve complex derivatives, like exotic options written 

on the minimum, maximum or difference of two assets, or some structured financial products, such as CDOs. 
^vq All of these innovations are built upon, or in order to exploit, the correlation structure of two or more assets. 

In the field of wireless communications, covariance estimates allows us to compute the direction of arrival 
' (DOA), which is a critical task in smart antenna systems since it enables accurate mobile location. Another 

application is in the field of biology and involves the interactions between proteins or genes in an organism 
T-H and the joint time evolution of their interactions. 

• • Typically the covariance matrix of a multivariate random variable is not known but has to be estimated 

^ ^ from the data. Estimation of covariance matrices then deals with the question of how to approximate the 

actual covariance matrix on the basis of samples from the multivariate distribution. Simple cases, where 
$^ the number of observations is much greater than the number of variables, can be dealt with by using the 

sample covariance matrix. In this case, the sample covariance matrix is an unbiased and efficient estimator 
of the true covariance matrix. However, in many practical situations we would like to estimate the covariance 
matrix of a set of variables from an insufficient amount of data. In this case the sample covariance matrix is 
singular (non-invertible) and therefore a fundamentally bad estimate. More specifically, let X be a random 
vector X = {Xi,.. . £ C"^! and assume for simplicity that X is centered. Then the true covariance 

matrix is given by 

S = E{XX*) = (cov(A„X,))i<i,,<„. (I.I) 
Consider n independent samples or realizations xi, . . . ,Xn e (^mxi form an m x n data matrix M = 
{xi, . . . ,Xn). Then the sample covariance matrix is an m x m non- negative definite matrix 

K^-MM*. (1.2) 
n 

If n — > +CX) as m tends to infinity, then the sample covariance matrix K converges (entrywise) to E almost 
surely. Whereas, as we mentioned before, in many empirical problems, the number of measurements is less 
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than the dimension {n < to), and thus the sample covariance matrix K is ahiiost always singular. Our target 
in this paper is to recover the true covariance matrix E from K under the condition n < m. 

The conventional treatment of covariance singularity artificially converts the singular sample covariance 
matrix into an invertible (positive definite) covariance by the simple expedient of adding a positive diagonal 
matrix, or more generally, by taking a linear combination of the sample covariance and an identity matrix. 
This procedure is variously called "diagonal loading" or "ridge regression" [171 IS] ■ The method is pretty 
straightforward. Consider 

aK + I31m 

as an estimate of S, where a,/3 are called loading parameters. This resulting matrix is positive definite 
(invertible) that preserves the eigenvectors of the sample covariance. The eigenvalues of aK + (31^. are a 
uniform shift of the eigenvalues of K. There are many methods in choosing the optimum loading parameters, 
see [H], [HI and [IS]. 

In Marzetta, Tucci and Simon's paper 13 , they suggested a new approach to handle singular covariance 
matrices. Let p < n be a parameter, to be estimated later, and consider the set of all p x to one-sided unitary 
matrices 

f^p,™ = {$ e C?'^™ : (1.3) 

Endow rip^m, with the Haar measure, that is, the uniform distribution on the set ilp^m- We define the 
operators 

covp(/s:) =E($*($ii'$*)$) (1.4) 

and 

invcovp(ii') =E($*($X$*)"i$) (1.5) 
where the expectation is taken with respect to the Haar measure. Surprisingly, they found that 

coYp{K) = . ((top - l)K + (to - p)Tr{K)lJl , 

(to^ — 1)to V / 

which is the same as diagonal loading. Moreover, they investigated the properties of mvcovp{K). If K is 
decomposed as K = UDU* , with D = diag((ii, . . . , c?„, 0, . . . , 0), then 

mvcovp{K) = Uimrcovp{D)U* , 

and 

invcovp(r>) = diag(Ai, . . . , A„, ^, . . . , ^). 

In other words, invcovp (i^T) preserves the eigenvectors of K, and transforms all the zero eigenvalues to a non- 
zero constant fi. They also provided the formulas to compute the A^'s and /i, and studied the asymptotic 
behavior of invcovp(Z3) using techniques from free probability. 

In this paper, we investigate new methods to estimate singular covariance matrices. In Section[2j we present 
some preliminaries in Schur polynomials that are later used in this work. In Section 3, we continue to work 
on the operator invcovp suggested in [T^. We also show that invcovp (i^T) actually has a very simple algebraic 
structure, i.e. it is a polynomial in K. A formula for computing E($($*D„<I>)'$*) is given to help to further 
study Equation (65) and Theorem 1, Section VI in |I3j. In Section 4, we consider a new approach, called 
the Ewens estimator, to estimate E. In this one, the average is taken over the set of all to x to permutation 
matrices with respect to the Ewens measure. The explicit formula for the Ewens estimator is computed by 
a combinatorial argument. In Section 5, we combine the ideas of the first two methods. We extend the 
definition of permutation matrices to get p x m unitary matrices and define two new operators 

Ke,m^p := E(vJ{V^KVj)V^y 
Ke,^^p := e(vJ{V^KVJ)+V,'^ 
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Figure 1. Young tableaux representation of the partition (5, 4, 1). 

to estimate S and respectively. We provide the explicit formula for Kg_m,p and an inductive formula to 
compute Kg^m,p- In Section 6, it is assumed that S has some special form, i.e. tridiagonal Toeplitz matrix 
or power Toeplitz matrix and we study its asymptotic behavior under Ewens estimator. In this Section, we 
also present some simulations under different methods to test the effect of the parameters. 

Notation: Throughout this paper, Is is the indicator function of a set S. We sometimes use [n] to present 
the set {1, 2, . . . , n}, and Tr{A) is the trace of a matrix A. For a vector v = {vi, . . . ,Vm), we use the Euclidean 
norm ||v||2 = I'^iP ^nd for an m x m matrix A, we use the Frobenius norm ||A|| = yiV(X4*). We 

use the notation h n to indicate that /i is a partition of the positive integer n. 

2. ScHUR Polynomials Preliminaries 

A symmetric polynomial is a polynomial P{xi, X2, ■ ■ ■ , x„) in n variables such that if any of the variables are 
interchanged one obtains the same polynomial. Formally, P is a symmetric polynomial if for any permutation 
a of the set {1,2,..., n} one has 

P{Xcr{l),Xcr{2),- ■ ■ ,Xa-{„)) = P{xi,X2, ■ ■ ■ ,Xn)- 

Symmetric polynomials arise naturally in the study of the relation between the roots of a polynomial in one 
variable and its coefhcients, since the coefficients can be given by a symmetric polynomial expressions in the 
roots. Symmetric polynomials also form an interesting structure by themselves. The resulting structures, and 
in particular the ring of symmetric functions, are of great importance in combinatorics and in representation 
theory (see for instance [7l [161 [121 [18] for more on details on this topic). 

The Schur polynomials are certain symmetric polynomials in n variables. This class of polynomials is very 
important in representation theory since they are the characters of irreducible representations of the general 
linear groups. The Schur polynomials are indexed by partitions. A partition of a positive integer n, also 
called an integer partition, is a way of writing n as a sum of positive integers. Two partitions that differ only 
in the order of their summands are considered to be the same partition. Therefore, we can always represent 
a partition A of a positive integer n as a sequence of n non-increasing and non-negative integers di such that 

n 

di — n with di > d2 > d^ > . . . > dn > 0. 

1=1 

Notice that some of the di could be zero. Integer partitions are usually represented by the so called Young's 
tableaux (also known as Ferrers' diagrams). A Young tableaux is a finite collection of boxes, or cells, arranged 
in left-justified rows, with the row lengths weakly decreasing (each row has the same or shorter length than 
its predecessor). Listing the number of boxes on each row gives a partition A of a non- negative integer n, 
the total number of boxes of the diagram. The Young diagram is said to be of shape A, and it carries the 
same information as that partition. For instance, in Figure [T] we can see the Young tableaux corresponding 
to the partition (5,4, 1) of the number 10. 

Given a partition A of n 

n = di + d2 + ■ ■ ■ + dn '■ di > d2 > ■ ■ ■ > dn > 
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Figure 2. Young tableaux representation of the partition (5,4,1) with its corresponding 
hook's lengths. 



the following functions are alternating polynomials (in other words they change sign under any transposition 
of the variables): 



,d^)ixi, . . . ,X„) 



det 



4' 

„d2 



E 



where Sn is the permutation group of the set {1,2, 
they are all divisible by the Vandermonde determinant 



A{xi, . . . ,Xn) = Yl 

l<j<fe<n 

The Schur polynomial associated to A is defined as the ratio: 

fl(di+n-l,d2+n-2,...,d„+0) (^^1 



n\ and e(cr) the sign of a. Since they are alternating. 



s\{xi,X2, ■ ■ . ,a;„) 



■ 5 ^n) 



A{xi,...,Xn) 

This is a symmetric function because the numerator and denominator are both alternating, and a polynomial 
since all alternating polynomials are divisible by the Vandermonde determinant (see [71[T21[IH] for more details 
here). For instance, 

5(2,1,1) (2^1' 2^2, 2^3) = 2:1X20:3 {Xi +X2 +X3) 

and 



5(2,2,0) (2^1, 2:2, 2:3) 



_ 22,22i22,2 

— X2 I X-^ X-^ ~r X^ X3 ~r X-^ X2 X3 

+ Xi xl X3 + Xi X2 X3. 



Another definition we need for the next Section is the so called hook length, hook(x), of a box x in Young 
diagram of shape A. This is defined as the number of boxes that are in the same row to the right of it plus 
those boxes in the same column below it, plus one (for the box itself). For instance, in Figure [2] we can see 
the hook lengths of the partition (5,4, 1). The product of the hook's length of a partition is the product of 
the hook lengths of all the boxes in the partition. 

We recommend the interested reader to consult [71 [T^l [H] for more details and examples on this topic. 



Some Properties of the invcovp Estimator 



For an Hermitian matrix K, one can decompose K ~ UDU* where U is unitary and D — diag((ii, . . . ,dm)- 
It was showed in [13] that invcovp(iir) — Uinvcov p{D)U* and invcovp(Z?) is still diagonal. Thus it is enough 
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to study the properties of invcovp(Z?). Let A{D) be the algebra generated by the matrices D and the mx m 
identity matrix /,„. By the Cayley-Hamihon Theorem it is clear that 

AiD) ^ {a„,^iD"'-^ + a,n~2D"'-^ + . . . + aiD + aolm : a, e c}. (3.1) 

We define I?m as the set of ah diagonal matrices in C™^"'. 

Lemma 3.1. Let D = diag(c?i, . . . , dm) be an ni x ni diagonal matrix in C™^™. // di ^ dj for i ^ j , then 
A{D) = 'Dm- If di = dj for some i ^ j, then 

A(D) = {diag{bi,...,b„...,b„...,bm) : bkeC}, 
the set of all diagonal matrices with the i*^ and j*^ entries equal. 

Proof. It is clear to see A{D) C On the other hand, for any B — diag(6i, . . . , bm) € 2^m, we form a 

system of hnear equations, 





/ "0 \ 






1 










\a.m-i) 







fbA /I dl d\ ... 4 

\bm) \l dm ■• 

The matrix is a Vandermonde matrix with det(y) = Jli<j(^i ~ ^j)- ^'tie matrix V is invertible by our 
assumption. Thus we can find a vector (ao, . . . , am-i) such that 

B = aolm + aiD + ... + Um-iD""^^ e A{D). 

This completes the proof. 

To prove the second part we use essentially the same approach as before. □ 
Theorem 3.2. The matrix invcovp(D) belongs to the algebra A{D). 

Proof. It has been shown in Equation (39) of [13], that if the matrix D is equal io D = diag(£>„, 0,„_„) 
where £>„ = (di, . . . , d„), then invcoVp(£>) = diag(AL(D„), /x/m-n) where Al(£)„) = (Ai,...,A„). From 
Lemma 3.1 it is enough to show that if di = dj for some i ^ then Ai = \j. 

In part B of Section VI in 13 , it was shown that 



Afc = ^ / Trlog($*A.$)# 
odk Jn„ „ 



Therefore, it follows that F{di, . . . , dn) is a symmetric function. Hence dF/ddi ~ dF/ddj. It is also known 
(Equation (77) in [T31) that 



/ Tr(ci>*7^„$)'d0 - E(-l)'4"''^S(i_fc,i.)(i?„), 

where S(i_fe,i'!) (£)») are the Schur polynomials. By linearity and continuity, F[di, . . . , d„) is symmetric. This 
completes the proof. □ 

3.3. Formulas for computing £($($*!)„$)'$*). Using Lemma 1 in [T3] we observe that 

(e($($*I?„$)'$*))^^ = U $($*i?„$)'$*d0j =^y" ^Tr($*A.$) 



' + 1, 
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Using Equations (69) and (70) from [13] we see that 



p-i 



I Tr(ci>*i^„$)A^d^ = ^(-l).- ^(^-^^^-) ;^ ,(^_^,,,)(I),0 



(3.2) 

(iV + n-(, + l))!(p-(j + l))!^(--'^^)(^")- 

Given a partition /i = (/xi, . . . , /xjv) of A^. The Schur polynomial of shape /i in the variables (di, . . . , (i„) is 
defined as 

M det(dr+^-'^)^^,, 

Sp(i?«) = S^{di, ...,dn)= ,,,n-j,n ' 

By the Murnaghan-Nakayama rule (see Corollary 7.17.5 in [l9]). 

E x("-^'^"Hp)n^l0^, (3.3) 

p=(l'-i,2'-2,...,Afiv)|-Ar 1 = 1 

where x^(p) = J2t(~^)^*'^^^ summed over all border-strip tableaux of shape n and type p and ht(r) is the 
height of a border-strip tableaux (see Section 7.17 in [TS] for more details and we will show a small example 
to compute (3.3) soon). Thus 



fe = l p=(l'-l,2'-2,...,Afiv)|-Ar i#fc ' 

fe=l fe=0 



(3.4) 



Therefore 



(3.5) 



^ l + l^^^ ^ (/ + l + n(j + l))!(p-(j + l))! 

The coefficients depend only on Dn,p and Z. Thus we are able to show E($($* Ai$)'$*) is a polynomial 
in Dn of degree Z, 

I 

fe=0 



3.3.1. Small dimensional examples: Let Aj = {N — j, V) be the partition of N with j ones. This one has a 
hook shape with N — j blocks in the row and j + 1 blocks in the column. 
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For I = 1, it was shown in ^ that 

n[n-' — 1) n[n^ — 1) 

For I = 2, p — (1'^), (1, 2), (3) h 3, we Ust all border-strip tableaux of shape Xj and type p, 





p=(l3) 


P=(l,2) 


P=(3) 


Ao = (3) 


1 


2 


3 




1 


2 2 


1 


1 


1 


Ai-(2,1) 


1 


2 


1 


3 


Does not exist. 


1 


1 




3 




2 




1 






A2 = (1,1,1) 


1 
2 
3 




1 
2 
2 




1 
1 
1 





Thus, 





P-(l') 


P-(l,2) 


P-(3) 


Ao = (3) 


1 


1 


1 


Ai = (2,l) 


2 





-1 


A2 = (1,1,1) 


1 


-1 


1 



.U^)^^ + ^^^^M + ^, %^^^? + Tr(^)^. + ^^^n^ 



SA,(i?) = 2 



3! 2 
Tr(i:))3 Tr(i:)3) ^^^^ 



3! 



ddi 



3 ' 
= -dj + Tr(i?)2 



SA.(^) 

Furthermore, 



Trior TT{D)Tr{D^) , Tr(i?3) Qs,, ^^^^^^^ ^ Tt{D)^ - Tt{D^ 



3! 



ddi 



J=0 



{2+p-j)\{n-j-iy.ds^^{D) 

{2 + n-jy.{p-j-iy. dd. 



(co + ci + C2)dj^ + (co - C2)TY{D)di + co 



Tr(D)2 + Tr(D2) 



C2- 



Ci 



where 



1 (2+p)!(n- 1)! 1 (1 +p)!(n- 2)! lp!(n-3)! 

Co = 77777^ 777 7TT> = - 7— rr- —7, C2 = 



3(2 + 7i)!(p-l)!' 



3(l + 7i)!(p-2)!' 



3n!(p-3)!' 



Finally, 



E 



($($*Z?$)2$*) = (co + ci + C2)i?2 + (co - c2)Tr(i?)D 



Co 



T,{D f + Tr( J^) _ ^^^^(^)2 ^ ^^ Tr(iJ)^-Tr(jJ^) 



8 



GABRIEL H. TUCCI AND KE WANG 



4. The Ewens Estimator 

Let Sm be the set of permutations of the set [m]. The Ewens measure is a probabihty measure in the 
set of permutations that depends on a parameter 9 > 0. In this measure, each permutation has a weight 
proportional to its total number of cycles. More specifically, for each permutation a from Sm its probability 
is equal to 

^'-"^'^^ = 9(9+1). ..(0 + 771-1)' 

where 9 > and K(a) is the number of cycles in a. The case 9 — 1 corresponds to the uniform measure. 
This measure has recently appeared in mathematical physics models (see e.g. [5] and [5]) and one has only 
recently started to gain insight into the cycle structures of such random permutations. 

Let (7 be a permutation in Sm, the corresponding permutation matrix M„ is an m x to matrix such that 
Ma('i,j) = li^s{j)- If we denote to be a 1 x m vector such that the i^^ entry is 1 and all the others entries 
are zero, then 

/ eo-(i) ~ 



which is, of course, a unitary matrix. Given the sample covariance matrix K we define the new estimator 
for E as 

Ke := E(M,KM:) (4.1) 
where the expectation is taken with respect to the Ewens measure. 

Theorem 4.1. Let K = (uij) 6e an to x to Tnatrix in C™^™. T/ien Kg = ¥.(Ma-KA'r^) is an 7n x 771 matrix 
such that the diagonal terms satisfy 

(Kelu - .^'^ a,, + — ^ -Tr(K), (4.2) 

t^ + m — 1 t' + TO— 1 

and the non-diagonal terms (i ^ j) satisfy 

(Ke)., = ^o + r77-2)(9 + 7n-l) + ' ^^^^^ + ' + "'^^ + . ^, 

1 

~ (9 + 7n - 2)(9 + 771 - 1) 
Remark 4.2. If 61 = 1, 



aik 



^ ■' kjtl 



l)a,j + (9- l)aj, + (9-l)J2 + "-^j) + J2 




ee" ee" ^ , eKe^ '^li.j^ij , „ Tr(K) — a 



Ki — a— h (3(Im ) where a = — — ^—^ — - and 



TO TO mm TO — 1 

This has already been computed in j20j. Proposition 2.2. 
If A' = D = diag(di, . . . , dm), then 



= a~i + a~, 1"''™' 

W + m— 1 9 + m — 1 



which corresponds to diagonal loading. 
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Proof. First, 



M„KM* 



K{el{l) ■■■ e;(m)) = ^ ^afe(e^(«)eUi) = {a^{i)a{j))i<i,3<r. 



=1 fe=i 



For diagonal terms, 



{K0)n = ^ Pe,m{<^)'- 



PS,m 



a-es„ 



res„,. 



<T(,) = 1 



= a, 



an 



' + m — 1 6 + m 



— E 

— Ilk 



1 



1 



' + m-l " e + m-1 



Tt{K). 



Now we compute the off-diagonal terms {K0)ij (i ^ j). For cr S iS^, if cr(i) = i and cr(j) = j, then 
(7 = («)(j)<Ji with (Ti e Sm-2 and 

If = j, a{j) = i, we erase i,j from a to obtain 0-2 € Sm-2, and 



Pe,m{(^) 



{0 + m-2){e + m-l) 



Pe,m-2icr6)- 



If fT(i) = i and (7{j) = k ^ i. j, then o" = with ct S Sm-i, and if(o') = ii'(o') + 1- Furthermore, we can 
erase j from a to get a new permutation a3{k) £ Sm-2 such that K{a3{k)) = K{a) and finally 



Pe,m{(^) 



{9 + m-2){e + m-l) 



P9,m-2i<J3{k)). 



Notice that T,a3{k)P0,m-2{(T3{k)) = 1. 

If a{i) = I ^ i,j and a{j) = j, then as above we can have a4{l) G Sm-2 such that 



e 



Pe,m{(^) = (g ^ ^ _ 2)(g + rn- and ^ Pe,™-2(CT4(0) = 1- 



<74{l) 



If cr(i) = I ^ i and tT(j) = k ^ j {k ^ I), and we exclude the case that a{i) = j, a{j) = i, we erase i,j from 
a to obtain (T5(Z, k) e «Sto_2, thus 



= oVfl -L rr, T^Pe,m-2{<J5{l,k)) and V Pe,m-2{(^5{l,k)) = 1. 

(^ + m-2)(e + m-l) ^^^^^ 
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Therefore, for i ^ j, 



= a. 



''^ {e + m-2){e + m-l) 


i9 + m-2){0 + m~l) 



crieSm-2 

X] P0,m-2{(T2) 



Cr3(fe)e'Sm-2 



e 



m — 2 



Without 1=3, k=% 



{9 + m-2){e + m~l) 



( \ 

'^ttij + {0- l)aji + ^ E ("''^ E 



□ 



5. Hybrid Method 

In this Section, we combine the ideas of the first two methods to create a third hybrid method. First, we 
extend the definition of a permutation. For an integer p < m, let 

Sp^m '■= : (T an injection from {f , 2, . . . to {1, 2, . . . m}|. 

The size of the set Sp^m is (^^p)\ and in the case p = m, Sm,m is the set of all permutations on {1,2,..., m}. 
For a e Sp^m, the associated px m matrix takes the form 



ea{2) 



where e(^(j) = ^o-(i)' • • • ' is a 1 x m row vector with the C7(z)*^ entry 1 and all others equal to zero. 



Notice 

and 

where 



VaV,' = Ip 



P: 



Pa := VjV„ = diag(pi, . . . 

= VCp' ^2 ^ f 1 if « e 
Z^y^ail)) ^ otherwise. 
1=1 ^ 



(5.1) 
(5.2) 



Next, we use the Ewens measure on the permutation sets to define a probability on the set Sp^m- For each 
c € Sp^m: consider the set 

^(7 := jo- e <Sm : a'{i,...,p} = crj. 
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In other words, fl^r is the set of all permutations in Sm whose restriction to the set {1, 2, . . . ,p} is equal to 
a. Recall that pe,„j is the Ewens measure on S„i with parameter 0. Define the probability measure on Sp yy^ 



for a e Sp^m as 



Me,m,p(o') := 



(5.3) 



Now we are ready to introduce two new operators 

Ko,^^p := ¥.{yJ{V„KVj)V„) (5.4) 

Ke,^,p := ¥.{vj {V„KVj)+V,) , (5.5) 

where [VcKVj)^ is the Moore— Penrose pscudoinverse of the matrix V^KYj . We use ife,m,p as an estimate 
for rt and Ke.m.p for f^^^. Now we show a few results on these new estimators. 

Theorem 5.1. Let A be an m x m matrix A — {uij) £ c™^™. Then Kg ^.p is an m x m matrix such that 
the diagonal entries are equal to 



§^^au, if 1 < i <p, 
e+m-i ""' if P + 1 < i < ^71- 



and the non-diagonal entries, assuming i < j (If j < i, exchange i and j in the following expression.) are 
equal to 



{K6,m,p)ij = ' 



( (9+p-l)(9+p-2) ifl<i<7<B 
(e+)n-l)(e+m-2)"*J' 1^ ^ ^ >■ ^ J ^ Pi 

(p-i)(9+p-i) ifl<i<D<7<m 



p(p-l) 



rflij, a p < i < j < m. 



y (0+m-l)(6i+m-2)"'»J' 

Remark 5.2. In the particular case that A is a diagonal matrix A = diag((ii, . . . , dm)i then 



e,m,p = ^-—^ 7D + — ^diag(di, . . . , dp, 0, . . . , 0). 

6^ + 771—1 + m — 1 



1 



-diag(0aii,a22,a33) 



For instance, if p = 1 and m = 3 then 

Remark 5.3. In the general case with p = 2 and 777 = 3 then 

1 



,3,2 



6 + 2 




9ai2 ai3 

+ l)a22 023 

032 2033/ 



Proof. Recall from Equation (5.2 1 that 



P. = VjK =diag(p?,...,p^ 



thus V^{V,AV^)V, = (pfp"a„)i<,j<„, where 



(0^ 



1 if ze{a(l),...,a(p)}, 
otherwise. 
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For the diagonal entries, if 1 < i < p, 



c^Sm.p ^=1 <y&Sm.,p,cr{l)=i 



17 \- p-1 \- 

m — 1 f + m — 1 

Go^_i,p_i (T GOm — l,p— 1 



0+p-l 



If p + 1 < i < m, 



{Ke,m,p)ii^ ^ l^0,m,p{<^){Pi)^au ^ Cii^^ ^ Id0^m,p{<y) 



+ m— 1 

For non-diagonal entries, if 1 < i < j < p, which turns out to be the most complicated case, pfPjaij is non 
zero if i,j G {o'(l), . . . , o-{p)}. Thus 

{K0^m,p)ii = dij ^ ^ M0,m,p(f)- 

<T(s)=«,<r(t)=3 

We divide the previous sum into five parts: 

(1) a{i) = i,<j{j) = j, thus if wc "erase" i,j from the sets [p] and [m], we get a new injection ai from 
[p]\{t,j} to [m]\{i,j} and if (a) = if (ai) + 2; 

(2) cr(s) = i, for some s € [p]\{i, j }, cr(i) = J, if we "erase" j from the sets [p] , [to] and consider s, i as one 
number s, we can get a new injection a2 ■ [p] U s\{i,j, s} — >■ [to] U s\{i,j, s} , and K{a) = K{u2) + 1; 

(3) a{t) = j, forsome t e [p]\{«, j}, cr(i) = i, similar to case (2) by exchanging the roles of we can 
get a new injection az with K{a) = if (0-3) + 1; 

(4) cr(s) = i, (j{t) = j,s ^ t, for some s € [p]\{«} and t e [p]\{j}) if we consider s, i as a new number s 
and t,j as a new number t, we get a new injection 1J4 : [p] U s,i\{i,j,s,t} — )• [to] U s,i\{i,j,s,t} , 
and if(cr) = if (0-4); 

(5) a{i) = j,<j{j) = i, if we "erase" i,j, we can get a new injection 0-5 : [p]\{«,j} — >■ [TO]\{*,i} and 
if (a) = if (c75) + 1. 
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Cr"l£5'ni-2,p-2 

flii^'b - 2) „ r„ ^ 



{9 + m- l){9 + m-2) 

aij9ip-2) ^ 

CI"3£5',„_2,p-2 



{p-2y + {p-2)_ 

2) 

C"4£5'„i-2,p-2 



{9 + m- l){9 + m-2) 



Me,m-2,p-2(0'5) 



6i+p-l)(6i+p-2) 
+ m-l)(6l + m-2)°" 



For 1 < i < p < j < m, we only consider two cases: s = i and s ^ i, 

(^''''"■^)'^"-^^^(0 + m-l)(^? + m-2) ^ M«,m-2,p-2(ai) 

CI"l£5'Tn-2,p-2 

/ ^ Me,m-2,p-2lO'2j 



For p < i < j < m, 



{9 + m-l){9 + m-2) 

C 2 £'Jru-2,p — 2 

^*^(6l + m-l)(6' + m-2)' 

p(p - 1) 



' + m-l)(0 + m-2)' 



□ 



Now we consider the estimate Kg^rn,p as in Equation (5.5). First we analyze the case when K is diagonal. 
Theorem 5.4. Let D — diag((ii, . . . , dp, 0, . . . , 0), then 

Ke,m..p = E(vJ{V,DVJ)+V^ = J' D+ + ^-^diag{d^\ . . . , 0, . . . , 0). 

V / 9 + m, — 1 9 + m — I 



+ m — 1 9 + m 



Proof. First we notice that Wa ■= VcDVj — (X^iLi '^i^''a{i)^''a(i))'^<^^i<P ^ diagonal matrix. For 1 < i < p, 



(W^.).. - 2.d/(e„(,)) - I Q ^oiherwise. 
1=1 



Thus 

= diag(dc,(i)lcr(i)e[„], . . . , dcr(p)lCT(p)e[ri]) 

and 

W+ = diag ((d^(i)l<,(i)g[„]) + , . . . , (d^(p)l^(p)g[„])+) . 
Next VjW^Vcr — X]r=i(^CT(i)lo-(/)e[n])''' is still a diagonal matrix where for 1 < i < 



(vTiA/+v\ -/ (c^'Tiol^WeN)^ if « e {ct(1), . . . ,(t(p)}, 
IV. W Va)u - i Q otherwise 
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Therefore Kg^rn,p is also diagonal and 

p 



1 = 1 <^'=Sm,p 



For 1 < i < n, 



di'UBh^ ifp+l<z<n. 



For n + 1 < i < m, {Kg^rn,p)ii = 0. 



□ 



Obtaining a close form expression for Equation (5.5 1 in the general case seems to be much more challenging. 
However, we are able to give an inductive formula for non-negative definite matrix K, with the help of a 
result of Kurmayya and Sivakumar's result P^. 

Theorem 5.5 (Theorem 3.2, jlO ). Let M ^ [A a] e M™^" be a block matrix, with A e C™^(""i) and 
a G C™ being written as a column vector. Let B — M* M and s = ||a|p — a*AA'^a. Then 

f{AA*)+ + s-^A+a){A+a)* -s-\A+ay 



-s-^\A+aY 



if s ^ and 

B+ = 
if s = 0, with 



{AA*)+ + \\b\WA+a){A+a)* - {A+a){A+b)* - {A+b){A+a)* -||6||2yl+a + A+b 



-\\b\\^(A+a)* + (A+b)* 
b = {A*)+{L + A+a{A+ay)-'^A+a. 



\b\\ 



For a non-negative definite matrix K, one can decompose 

f (di 

K = UDU* = 



U2 



di 



\u„J \ 

where [/ is a unitary matrix. Then 

fUa{l)\ fdi 
\Uaip)J 



di 



\ 

dm J 
\ 

dm J 



""(7(1) "<j(2) 



V(p) 



) "'^(2) 



V(p) 



where 



Let M = [Ml a] with Mi = (^^^(i) m;(2) . . • K(p^i) 
b = (Mi*)+(/ + M+a{M+a)*)-Hl+a. By Theorem 



M*M, 



'dmuT). 
and a — u 



a(p)- 



Let s = ||a||2 - a*MiM^a and 



5.5 



{M*M)+ = 



(MiMi*)+ 0' 
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6. Performance and Simulations 

In this Section, we study the performance of our estimators and we compare it with traditional methods. 
We focus on the case where the true covariance matrix has a Toephtz structure. More specificaUy, we focus 
on the foUowing two types of Toephtz matrices. 



6.1. Tridiagonal Toeplitz Matrix. Consider an m x m symmetric tridiagonal Toeplitz matrix of the form 



/I b 
b 1 b 



B 



b 1 b 
\ b IJ 

Proposition 6.1.1 ([3]). The eigenvalues and corresponding eigenvectors of B are given by 

nj 



and 



Xj = 1 + 2b cos 



( 1 ) ' (^^) , . . . , sin ( ) ] where j = 1, 2, . . . , m 



TO + 1 / ' 



We are interested in the case when B is non-negative definite and the entries of B are non-negative. Therefore, 



it is not hard to see that b should belong to the set 



1 

2cos(Tr/(m+l)) 



for this to hold. 



6.2. Power Toeplitz matrix. An m x m power Toeplitz matrix is given by 



( 1 

a 



a™"2 Q,™-3 



V 

Proposition 6.2.1. Let as before then 

(1) Aa > if and only if \a\ < 1. 

(2) det(yl„) = (1 



1 

a 



a 

1 / 



l<2,_;<7n 
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(3) For a^l, 



/I -a 

-a 1 + —a 



A -1 



\ 



—a 1 + a —a 
~a 1 J 



In particular, when m — oo, the asymptotic behavior of the eigenvalues of ^ are essentially the same as 
that of tridiagonal Toepltiz matrix. 

Proof. For (1), use induction. (2) follows directly from (1). To prove (3), use the matrix inverse formula and 
(1). □ 

For our practical purposes, we consider the case when a G [0, 1). 

6.3. Preliminaries on the asymptotic behavior of large Toeplitz matrices. We first collect some 
basic definitions and theorems regarding large Toeplitz matrices from Albrecht Bottcher and Bernd Silber- 
mann's book [3] . For an infinite Toeplitz matrix of the form 



define the symbol of matrix A to be 



/flo a_i a_2 • ■ A 

fli oq a_i 
02 ai ao . . 



■J 



n— — OO 



for 0<e <2tt. 

Let Am be the m x m principal minor of the matrix A. Given a Borel subset E G C we define the measures 



1 



3 = 1 



and 



KE) = ^ / XE{a{en)de, 
27r Jq 



(6.1) 



(6.2) 



where xe is the characteristic function of the set E and {A^ the eigenvalues of A^. The following 

classical result holds. 



Theorem 6.1 (Corollary 5.12 in ^). If a € L°° is real-valued, then the measures fim given by (6.1) converge 
weakly to the measure ji defined by (|6.2|). 



6.4. Asymptotic Behavior of Toeplitz Matrices under Ewens Estimator. For the symmetric tridi- 
agonal Toeplitz matrix B its symbol is 



a(e'^) = l + be''^ + be 



-id 



1 + 2&COS6', 
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where E [0, 27r]. By Theorem 1.2 m [4!, the spectrum of _B as m tends to infinity is supported on the 
interval [1 — 25, 1 + 2b]. On the other hand, by Theorem 4.1 we have that 



Bg E{M,BM;) 



Irr 



92 + 61 - 2 



b{0-l) 



+ 



{e + m-2)i9 + m-l) {9 + m - 2){0 + m - 1) 

2&(m-l) , J. 



(6.3) 



' + m-2){e + m- 1) 



where 



T — 



and 



If is a fixed constant greater than 1 then as m — > 00, 

bie-i] 





1 


3 


3 




3 


2\ 


1 





2 


4 




4 


3 


3 


2 





2 




4 


3 


3 


4 


4 







2 


3 


3 


4 


4 




2 





1 


V2 


3 


3 




3 


1 








6 






\ 






b 



















6 





b 






\ 






6 


V 





and 



+ m - 2)(e' + m- 1) 
6*2 + 6* - 1 



„ 4m 
T„J < ^ ^ 



(6.4) 

(0 + m-2)(^^ + m-l)"^"" ^^'^^ 

as m — > CXI. Therefore, and (1 — —)Im + — ee^ are asymptotically equivalent sequences (see Chapter 2, 
[5]) and by Theorem 2.6 in 0, 

lim = lim 
which is a rank-1 perturbation of identity matrix. Therefore, 

lim /i^" = (5i, 

m— f 00 

where is the Dirac measure at the point t. A more interesting situation happens when Q = /3to for a fixed 
constant /?. In this case, 

R -r , r , 1^ , 26 1 ^ 

' " " + (;3 + l)2'^"^ (/3 + l)2m '"^ (/3 + l)2m^'''' 

Since 



iTrM^l 

m \ (/3 + 1)2 m 



r„, 1 <A7l6m2^0 



and 



-Trl 



2b 1 



m \ (/3 + 1)2 m 



. T x\ 4&2 

(ee^ - /„) < ^m2 
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as m — >■ oo. By Lemma 2.3 in pj, the Levy metric of the empirical distributions of two m x m Hermitian 
matrix A, B satisfies 

L{^^t M^) < (-Tr(^ -B){A- B)*) "\ 

It is known (see Theorem 6, Section 4.3, [8 ) that the distribution functions /i„i converges weakly to /i if and 
only if the Levy metric i(/i„i,/^) — )• 0. Therefore 



For the matrix 



lim ^* = lim m™*'"*''"'" 



m 1 



which is still a tridiagonal Toeplitz matrix with symbol 

a{e^0) = l + 2b^J^cos{e). 

Hence the limit eigenvalue distribution is supported on the interval [l — 2&p^^, 1 + 2b ^^^^^2 ] . The Figure 
below shows the estimated density function for the spectrum as /3 changes. 



Tridiagonal Toepiitz IVlatrix m=300, b=0.3 




Figure 3. This Figure shows the density functions of the empirical spectral distribution of 
300 X 300 tridiagonal Toephtz matrix B with h = 0.3 and those of V.{M„BM*) for different 
O's. 



For the power Toeplitz matrix Aa, 



a a ^ „ cosiO) — a 

^(^ ) = 1 + 77^^ + Z^J^ = l + 2a 



Thus the spectrum of Aa as m tends to infinity is supported on [j^, T^]- 



eJ^-a e-'^-a {cos{e) - a)^ + sin^ {0) 

1-0 
1+0 

By Theorem |4.1[ one can get 

Ae = E(M,A„m;) 

_ + ^ , a{a^-ma + m-l-{e-l){a-l)) ^ ^_ 

" "+(e + m-2)(0 + m-l)^ " '"^"^ (l-a)2(0 + m-2)(0 + m-l) ^ (6.6) 

_^ ^^i J 

l-a{e + m-2){e + m-l) 
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Power Toeplitz Matrix m=300, a=0.5 




— true eigenvalue 

—9=300 

—9=500 

—9=1000 

—9=3000 



1.5 



2.5 



3.5 



Figure 4. This Figure shows the estimated density functions of the empirical spectral 
distribution of 300 x 300 power Toeplitz matrix ^0.5 and those of E(Mo-^o.5-^ct) for different 

where Jm — (hj) with diagonal entries lu — and non-diagonal entries kj = a* + + a™+^^' + 0;"'+^^^. 
In the case 9 — /3m then 



iTr ' 



m 



I- a{6 + m-2){e + m-l) 




(6.7) 



m{l — a) 



Similarly, we can show that 



For the matrix 



hm ^Ji^' = hm ^™ 



one has 

/ g Q \ ^ -, 2a(cos(6>) - a) 

Thus the limiting spectrum is supported on the interval 



(/3 + l)2l + a' (/3 + l)2l-a 

6.5. Simulations. In this Section, we present some of the simulations to test the performance of our esti- 
mators. Let Aa be an m X TO Toeplitz covariance matrix with entries a.^j = a'*^-''. Assume that we take n 
measurements and we want to recover E to the best of our knowledge. After performing the measurements we 
construct the sample covariance matrix K and proceed to recover in terms of the operators invcoVp(i4r) 
and ¥,{M„KM*). First we look at the eigenvalue distributions under invcovj, and Ewens estimators. In 
Figure [5] we can observe a realization of this experiment with a = 0.5, m = 200 and n = 150. We see that 
the eigenvalues of A^ range roughly from 1/3 to 3. For the sample covariance matrix 50 eigenvalues are 
precisely zero. Both, the inverse of invcovp and Ewens estimators give non-zero eigenvalues. The eigenvalues 
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True Covariance Matrix with m=200 and a = 0.5 



HIMIIIIIIlllllimillMTIlM 



50 100 150 200 

Sample Covariance with n = 150 



50 100 150 200 

Inverse of invcov with optimum p ( p = 45 ) MSE = 0.7420 



iilMMIMIMlllllllllll^ I^MIIIIMW— 



True Covariance Matrix with m = 200 n = 1 50 and a = 0.5 




100 150 200 

Sample Covariance with n = 150 



.l^ktaiiiiiiiiiiii [ 



50 100 

Ewens Method with Optimum 6 I 



1 50 200 

:261) MSE = 0.6607 



Figure 5. Comparison of the eigenvalue distributions of the true covariance matrix and 
sample covariance matrix and the invcov estimator vs. the Ewens estimator. 



under the inverse of invcovp {p ~ 45) range from 0.4 to 2 and those under Ewens (with 9 — 261) estimate 
from 0.6 to 2.7. Similar results were observed for other parameter values. 

In Figures |6] and [Tj we show the performance of the estimators for different values of p and 9. It was 
observed in [T^ that the estimator invcovp outperforms the more standard and classical estimator of diagonal 
loading for optimal loading parameters as in Ledoit and Wolf [11] by computing the Frobenius norm (MSE) 
ll^a — miiivcoVp(ivr)~^||2 for the different values of p and then computing ||^q — Klw\\2- The same type 
of experiments were performed on a variety of different scenarios as well. Let A^, m,p, n, K and 9 as before 
and define the functions 

f{m,n,a,p) = \\Aa - {p/m)mvcovp{Ky^\\2, 
g{m,n,a,p) = \\A^^ - {m/p)mvcovp{K)\\2, 
F{m,n,a,9) = \\A^ - E{M,KM:)\\2. 
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Forbenius Norm of invcov and inverse of invcov 




40 50 60 

Parameter p 



Figure 6. The functions / and g for m = 200, n = 150 and a ~ 0.5 as functions of p. 

Ewens Estimation of the Covariance IVIatrix 

0.9 



O 0.8 

E 

o 

^ 0.75 



CD 

^ 0.7 



0.65, 







1 1 1 1 1 1 








Optimum G = 270 with MSE = 0.6607 





































50 100 150 200 250 300 350 400 450 500 



Figure 7. The functions / and g for 771 = 200, n = 150 and a — 0.5 as functions of 9. 

We can observe how the Ewens estimator outperforms the invcovp estimator for the optimum values of p 
and 9. The next Figures show the behavior of the previous functions for different parameter values a, m, n,p 
and 9. 
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